Let's assume you are a biologist and you did a lot of research lately on the model organism Chlamydomonas reinhardtii.You made the big discovery that the production of the heat shock factor, HSF1, is directly proportional to the concentration of $\textrm {Na}^{+}$ in the growth medium. You assume that is the case because HSF1 also regulates the salt stress response of Chlamydomonas reinhardtii. You plan to do more research on this subject to find out in what way the increased HSF1 concentration changes the expression of the genes of Chlamydomonas reinhardtii. The most optimal approach for this problem would be to do RNA-seq with Chlamydomonas reinhardtii. However, your lab group recently did a lot of very expensive experiments and it is unlikely that enough money is available to do another RNA-seq experiment. Especially since you are only interested in the expression change of a handful of genes at the moment. However, your group recently did a proteomics experiment.
While talking about that problem in one of your group meetings, one of your co-workers comes up with the idea to instead model this system to computationally determine the impact of increased HSF1 concentration in the cell. Afterwards you can validate the model by looking at the protein concentration that got determined by the proteomics experiment.
Following the plot above, most would be unable to just start modelling a complex systems. Thus, this blog post tries to give a rundown on how to break down the system into smaller subproblems, how to model these subproblems, and finally how to solve the complex problem.
The proposed problem is the influence of HSF1 on the gene expression of a subset of genes after HSF1 was upregulated by the extracellular concentration of $\textrm {Na}^{+}$. The smallest subproblem that can be found in this list is the transcription of a gene as it is a one step process where mRNA is produced from a gen. Or in other words the concentration of an mRNA Y is dependent on the transcription rate of a gen X. Furthermore, the actual mRNA concentration is also dependent on the time that the gen produces the mRNA. These simple observations can be formulated as differential equation
For ease of use it is common to represent rates or expression (or later of production) with Greek letters to make the equations easier to read. The expression rate gets named $\beta$ from here on. $$ \frac {dmRNA} {dt} = \beta$$ Speaking in more mathematical terms the shown differential equation describes how the dependent variable mRNA concentration changes with respect to the independent variable time. This relationship is described by the function $\beta$ which is the transcription rate of the gene.
Solving a differential equation by hand is a very complicated process. Thus, the solution to differential equations are mostly approximated. There are different approximation methods with the most commonly known being the Euler method.
F# offers a open source library called FsODE which is able to solve differential equations. FsODE can be used like every other F# library by importing it and opening it.
#r "nuget: FsODE, 0.0.2"
open FsODE
The next step in solving a differential equation is to choose the method with which the model should get solved. The default option is called RK546M and is based on the Runge-Kutta Method.
let modelContext = //OdeContext()
OdeSolverMethod.RK546M //RK547M
|> OdeContext
The following step is to implement the differential equation. This is done by first generating a model which will contain the differential equation. The model is named dmRNA_dt in the following code block to represent that it models the change of mRNA concentration in respect to time.
Three parts are required for the model to work. The first is the definition of the dependent variable, which means the variable which values change in dependence on the independent variable, which is the mRA concentration. The second is the differential equation which describes this relationship. The third is an output array that takes the differential equation. It should be noted that the dependent variable and the differential equation must have different names.
//Constants
let beta = 2. // mRNA transcription constant
//the differential equation as model
let dmRNA_dt : Model =
// P is just the variable name, can also be named mRNA,Protein, etc.
fun P t ->
// definition of the dependent variable
let mRNA = P[0]
// differntial equation
let mRNA' = beta
//output array
[| mRNA' |]
Afterwards the start conditions need to be specified, which in this case is the start concentration of mRNA.
//Startconcentration of the mRNA
let P0 = [|
0.0
|]
Solving the model is then done by simulating the model. Which means giving the solving context the start value of the independent variable, the start condition of the dependent variable, and the model. The last parameter to be specified afterwards is the precision of the simulation.
// function to simulate the model
let simulation =
modelContext.OdeInt(
0., // Start time of the simulation
P0, // Start parameter of the variable that gets modeled
dmRNA_dt //the model
)
|> SolPoints.take 10 // Number of timepoints to model
|> SolPoints.memorize //stors the calculated values
Running the simulation is then done by defining the simulation type and specifying for which differential equation the simulation should get solved. The specification is done by calling the index of the dependent variable that were defined in the model. The index starts with 1. Thus, calling Sol.Points 1 specifies the dependent variable mRNA (defined above as let mRNA = P[0]).
let mRNA = SolPoints.toPoints 1 simulation
The complete simulation can be visualized afterwards.
#r "nuget: Plotly.NET.Interactive, 3.0.2"
open Plotly.NET
Loading extensions from `C:\Users\felix\.nuget\packages\plotly.net.interactive\3.0.2\interactive-extensions\dotnet\Plotly.NET.Interactive.dll`
mRNA
|> Chart.Spline
|> Chart.withTraceInfo("mRNA concentration ")
|> Chart.withXAxisStyle("Time [s]")
|> Chart.withYAxisStyle("mRNA concentration [μM]")
While this simple model describes the production of mRNA in the cell, it does not represent the biological reality. The mRNA concentration in the cell does not increase all the time as mRNA in the cell gets degraded. This makes a lot of sense, as a linear increase of the mRNA concentration as shown in the diagram above would eat resources that are needed for other tasks like metabolism. Thus, the degradation is dependent on the mRNA concentration in the cell, as the cell needs to degrade more mRNA if the mRNA concentration is high, and less if the mRNA concentration is low. Using $\gamma$ as degradation constant it is possible to extend the previous differential equation to $$ \frac {dmRNA} {dt} = \beta - \gamma * \textrm{[mRNA]}$$ Since it is a degradation it needs to be subtracted from the production term. Implementing this in F# is identical to the steps before, the only necessary change is the modification of the differential equation.
//constants
let beta = 0.5 //mRNA expression
let gamma = 0.1 //mRNA degradtation constant
let dmRNA_dt2 : Model =
fun P t ->
let mRNA = P[0]
let mRNA' = beta - gamma* mRNA
[| mRNA' |]
let simulation2 =
modelContext.OdeInt(
0., // Start time of the simulation
P0, // Start parameter of the variable that gets modeled
dmRNA_dt2 //the model
)
|> SolPoints.take 10 // Number of timepoints to model
|> SolPoints.memorize //stors the calculated values
let mRNA = SolPoints.toPoints 1 simulation2
mRNA
|> Chart.Spline
|> Chart.withTraceInfo("mRNA")
|> Chart.withXAxisStyle("Time [s]")
|> Chart.withYAxisStyle("mRNA concentration [μM]")
Extending the simple model according to the influence of HSF1 is a straight forward approach. Assuming HSF1 works as activator the production of mRNA of the affected gens should increase. This can be reflected by inserting a term to the transcription rate of these gens. This term needs to reflect that a higher HSF1 concentration increases the mRNA concentration, whereas a low concentration would keep the mRNA concentration low. Additionally, the term should include a parameter which determines how tightly HSF1 binds to the promotor region of the gen, this parameter is called binding constant. A strong binding constant would mean HSF1 stays on the promotor for a long time, so a lower overall HSF1 concentration is needed to get the desired effect, whereas a weak binding constant reflects that HSF1 can dissociate easily from the promotor region, so a high concentration of HSF1 is needed to get the desired effect. In practise a more common parameter is the dissociation constant, which is the reciprocal value of the binding constant. This leads to $$ \frac {dProteinconcentration} {dt} = \beta* \frac {1}{1+\frac{K_{D}}{A}} - \gamma*Proteinconcentration$$ with A as concentration of the activator (HSF1) and $K_{D}$ as dissociation constant.
Side note: The sharp minded reader might recognise that the term that got inserted does not increase the overall transcription of the gene but rather decreases it as the inserted term can never get smaller than 0. However, the way to think about this is that nature can use a higher production constant $\beta$ in a regulated gene expression process compared to an unregulated one, as the regulated gene expression will only be active when it is needed and not always. So, the real increase in production is hidden in $\beta$ and not the activator term.
//constants
let beta = 2.0 //mRNA production
let gamma = 0.1 //mRNA degradtion
let K_d = 1.0 //dissociation
let A = 0.5 //activator concentration
let dP_dt3 : Model =
fun P t ->
let mRNA = P[0]
let mRNA' = beta * 1./(1.+(K_d/A)) - (gamma * mRNA)
[| mRNA' |]
let P0 = [|
0.0
|]
let simulation3 =
modelContext.OdeInt(
0.,
P0,
dP_dt3
)
|> SolPoints.take 10
|> SolPoints.memorize
[
mRNA
|> Chart.Spline
|> Chart.withTraceInfo("unregulated");
let mRNARegulatted = SolPoints.toPoints 1 simulation3
mRNARegulatted
|> Chart.Spline
|> Chart.withTraceInfo("regulated");
]
|> Chart.combine
|> Chart.withXAxisStyle("Time", MinMax =(0,40))
|> Chart.withYAxisStyle("Proteinconcentration", MinMax =(0,10))
The mRNA that gets produced from a gen gets further processed in the cell by getting translated into proteins. Thus, a second differential equation is needed that describes the behavior of the protein concentration. This second differential equation must be dependent on the mRNA concentration, as a high mRNA concentration will lead to a higher protein concentration compared to a low mRNA concentration. Additionally, the protein also needs to be degraded depending on the number of proteins in the cell. This leads to the following differential equations
$$ \frac {dmRNAconcentration} {dt} = \beta_{1}* \frac {1}{1+ \frac {K_{D}}{A}} - \gamma_{1}*mRNAconcentration$$$$ \frac {dProteinconcentration} {dt} = \beta_{2}* mRNAconcentration - \gamma_{2}*Proteinconcentration$$where both have different production and degradation constants and the protein concentration is directly dependent on the mRNA concentration.
//Constants
let beta1 = 1.0 //mRNA prouction
let gamma1 = 0.8 //mRNA degradation
let beta2 = 0.5 //protein production
let gamma2 = 0.35 //protein degradation
let K_d = 5. //dissociation
let A = 0.5 //acivator concentration
//model
let dP_dtfinal : Model =
fun P t ->
let mRNAconcentratoin = P[0]
let Proteinconcentration = P[1]
let mRNAconcentratoin' = beta * 1./(1.+(K_d/A)) - (gamma * mRNAconcentratoin)
let Proteinconcentration' = beta2 * mRNAconcentratoin - (gamma2 * Proteinconcentration)
[| mRNAconcentratoin';Proteinconcentration' |]
//Start concentrations
let P0 = [|
0.0
0.0
|]
let simulationFinal =
modelContext.OdeInt(
0.,
P0,
dP_dtfinal
)
|> SolPoints.take 50
|> SolPoints.memorize
[
let mRNA = SolPoints.toPoints 1 simulationFinal
mRNA
|> Chart.Spline
|> Chart.withTraceInfo("mRNA")
let proteine = SolPoints.toPoints 2 simulationFinal
proteine
|> Chart.Spline
|> Chart.withTraceInfo("Protein")
]
|> Chart.combine
|> Chart.withXAxisStyle("Time [s]", MinMax=(0,100))
|> Chart.withYAxisStyle("Concentration [μM]", MinMax=(0, 10))
The original model had the following structure:
A "complete" model of this schema can ignore the receptor as it does not affect the differential equations if Na+ directly or indirectly effects the gene expression of hsf1. Thus, the following differential equations can be used to model this system
$$ \frac {dmRNAconcentration_{HSF1}} {dt} = \beta_{1}* \frac {1}{1+ \frac {K_{D Na_{+}}}{A_{Na^{+}}}} - \gamma_{1}*mRNAconcentration_{HSF1}$$$$ \frac {dProteinconcentration_{HSF1}} {dt} = \beta_{2}* mRNAconcentration_{HSF1} - \gamma_{2}*Proteinconcentration_{HSF1}$$$$ \frac {dmRNAconcentration_{SSR}} {dt} = \beta_{3}* \frac {1}{1+ \frac {K_{D HSF1}}{A_{HSF1}}} - \gamma_{3}*mRNAconcentration_{SSR}$$$$ \frac {dProteinconcentration_{SSR}} {dt} = \beta_{4}* mRNAconcentration_{SSR} - \gamma_{4}*Proteinconcentration_{SSR}$$$$ \frac {dNa^{+}} {dt}= - \gamma_{5}*Proteinconcentration_{SSR} $$with SSR representing proteins that counteract the salt stress that is sensed via a high $Na^{+}$ concentration. These differential equations are just two gene expression systems put together. Thus, only the number of equations increased but the actual complexity of every single equation stayed the same.
//Constants
let beta1 = 1.0 //mRNA production of hsf
let gamma1 = 0.8 //mRNA degradation of the hsf
let beta2 = 0.5 //protein production of HSF
let gamma2 = 0.35 //protein production of HSF
let beta3 = 2.0 //mRNA production of ssr
let gamma3 = 1.0 //mRNA degradation of ssr
let beta4 = 2.0 //protein production of SSR
let gamma4 = 0.8 //protein degradation of SSR
let gamma5 = 0.05 //removal of Na+
let K_d_na = 1.0 //dissociation of Na
let K_d_hsf = 0.8 //dissociation of HSF
//model
let modelFinal : Model =
fun P t ->
let mRNAcHSF = P[0]
let proteincHSF = P[1]
let mRNAcSSR = P[2]
let proteincSSR = P[3]
let NA = P[4]
let mRNAcHSF' = beta1 /(1.+(K_d_na/NA)) - (gamma1 * mRNAcHSF)
let proteincHSF' = beta2 * mRNAcHSF - (gamma2 * proteincHSF)
let mRNAcSSR' = beta3 /(1.+(K_d_hsf/proteincHSF)) - (gamma3 * mRNAcSSR)
let proteincSSR'= beta4 * mRNAcSSR - (gamma4 * proteincSSR)
let NA' = -gamma5 *proteincSSR
[| mRNAcHSF';proteincHSF';mRNAcSSR';proteincSSR';NA'|]
//Start concentrations
let P0 = [|
0.0
0.01
0.0
0.0
10.
|]
let simulationFinal =
modelContext.OdeInt(
0.,
P0,
modelFinal
)
|> SolPoints.take 50
|> SolPoints.memorize
[
let mRNAHSF = SolPoints.toPoints 1 simulationFinal
mRNAHSF
|> Chart.Spline
|> Chart.withTraceInfo("mRNA HSF1")
let proteinHSF = SolPoints.toPoints 2 simulationFinal
proteinHSF
|> Chart.Spline
|> Chart.withTraceInfo("Protein HSF1")
let mRNASSR = SolPoints.toPoints 3 simulationFinal
mRNASSR
|> Chart.Spline
|> Chart.withTraceInfo("mRNA SSR")
let proteinSSR = SolPoints.toPoints 4 simulationFinal
proteinSSR
|> Chart.Spline
|> Chart.withTraceInfo("Protein SSR")
let naion = SolPoints.toPoints 5 simulationFinal
naion
|> Chart.Spline
|> Chart.withTraceInfo("Natrium")
]
|> Chart.combine
|> Chart.withXAxisStyle("Time [s]", MinMax=(0,62))
|> Chart.withYAxisStyle("Concentration [μM]", MinMax=(0, 10))
Coming back to the reality of the biological world. It is very unlikely that the above model actually models the way HSF1 regulates other genes and proteins. A linear model is very slow and unprecise. A more likely model is one that factors in a ultrasensitive HSF1 regulation. The same is likely for the regulation of the hsf1 gene, which should also be modelled as ultrasensitive gene. This can be done by implementing an exponent to the activator part of the function.
$$ \frac {dmRNAconcentration_{HSF}} {dt} = \beta_{3}* \frac {1}{1+ (\frac {K_{DNa^{+}}}{A_{Na^{+}}})^{n1}} - \gamma_{3}*mRNAconcentration_{HSF}$$$$ \frac {dmRNAconcentration_{SSR}} {dt} = \beta_{3}* \frac {1}{1+ (\frac {K_{D HSF1}}{A_{HSF1}})^{n2}} - \gamma_{3}*mRNAconcentration_{SSR}$$The exponent n describes how strong the ultrasensitive regulation is. One describes a linear correlation, two a quadratic correlation, three a cubic correlation and four a quartic correlation. Higher values are unlikely to be found in the biological context. This exponent is called hill coefficent).
//Constants
let beta1 = 1.0 //mRNA production of hsf
let gamma1 = 0.8 //mRNA degradation of the hsf
let beta2 = 0.5 //protein production of HSF
let gamma2 = 0.35 //protein production of HSF
let beta3 = 2.0 //mRNA production of ssr
let gamma3 = 1.0 //mRNA degradation of ssr
let beta4 = 2.0 //protein production of SSR
let gamma4 = 0.8 //protein degradation of SSR
let gamma5 = 0.05 //removal of Na+
let K_d_na = 1.0 //dissociation of Na
let K_d_hsf = 0.8 //dissociation of HSF
let n1 =2 //hsf hill constant
let n2 = 3 //srr hill constant
//model
let modelFinal : Model =
fun P t ->
let mRNAcHSF = P[0]
let proteincHSF = P[1]
let mRNAcSSR = P[2]
let proteincSSR = P[3]
let NA = P[4]
let mRNAcHSF' = beta1 /(1.+(K_d_na/NA)**n1) - (gamma1 * mRNAcHSF)
let proteincHSF' = beta2 * mRNAcHSF - (gamma2 * proteincHSF)
let mRNAcSSR' = beta3 /(1.+(K_d_hsf/proteincHSF)**n2) - (gamma3 * mRNAcSSR)
let proteincSSR'= beta4 * mRNAcSSR - (gamma4 * proteincSSR)
let NA' = -gamma5 *proteincSSR
[| mRNAcHSF';proteincHSF';mRNAcSSR';proteincSSR';NA'|]
//Start concentrations
let P0 = [|
0.0
0.01
0.0
0.0
10.
|]
let simulationFinal =
modelContext.OdeInt(
0.,
P0,
modelFinal
)
|> SolPoints.take 50
|> SolPoints.memorize
[
let mRNAHSF = SolPoints.toPoints 1 simulationFinal
mRNAHSF
|> Chart.Spline
|> Chart.withTraceInfo("mRNA HSF1")
let proteinHSF = SolPoints.toPoints 2 simulationFinal
proteinHSF
|> Chart.Spline
|> Chart.withTraceInfo("Protein HSF1")
let mRNASSR = SolPoints.toPoints 3 simulationFinal
mRNASSR
|> Chart.Spline
|> Chart.withTraceInfo("mRNA SSR")
let proteinSSR = SolPoints.toPoints 4 simulationFinal
proteinSSR
|> Chart.Spline
|> Chart.withTraceInfo("Protein SSR")
let naion = SolPoints.toPoints 5 simulationFinal
naion
|> Chart.Spline
|> Chart.withTraceInfo("Natrium")
]
|> Chart.combine
|> Chart.withXAxisStyle("Time [s]", MinMax=(0,62))
|> Chart.withYAxisStyle("Concentration [μM]", MinMax=(0, 10))
This final model represents the regulation model proposed in the motivation section. However, it still has some noticeable flaws. The salt stress response in Chlamydomonas reinhardtii consists not only of one protein, but multiple. Thus, it would be required to split the differential equations for the SSR protein into multiple differential equations which describe their individual productions. Furthermore, the values for beta, and gamma are just guesses and not the reaction rates that are found in the real world. Biological systems modelling should always be orientated on the experimental findings, thus a general rule is to always use the experimentally determined reaction rates and only start to guess reaction rates if you do not find any already measured ones. It should also be noted that this blog post is just and introduction to modelling differential equations. It did not touch on repressor proteins, the constant protein production that most proteins have, autoregulation, and many more details. However, most of them play a role in salt stress reaction and need to be implemented as well. In case you want to model something yourself and don't know where to search for the missing pieces then try this resource (the course is done in python but the mathematical models are identical).